After discussing the idea of abstraction in the categorical framework of [Rischel2020] in the previous notebooks, we now step back to introduce the notion of transformation presented in [Rubenstein2017], which is also used to define relations between causal models and discuss their consistency.
In this notebook we dive deep into the idea of transformation, relying on the formalism and the ideas we developed in the previous notebooks. We provide several examples both to clarify the meaning of transformations, and to set up an initial comparison between transformations and abstractions.
This notebook was developed in order to get a better understanding of the framework introduced in [Rubenstein2017], and to lay stronger foundations for further work with the idea of abstraction of causal models. The notebook is structured as follows:
DISCLAIMER 1: the notebook refers to ideas from causality and category theory for which only a quick definition is offered. Useful references for causality are [Pearl2009,Peters2017], while for category theory are [Spivak2014,Fong2018].
DISCLAIMER 2: mistakes are in all likelihood due to misunderstandings by the notebook author in reading [Rischel2020] and/or [Rubenstein2017]. Feedback very welcome! :)
In this notebook we will review the work presented in [Rubenstein2017] using mainly the notation adopted in [Rischel2020], or a notation derived from the formalism of [Rischel2020]. Although we will explain most of the ideas and formulas throughout the notebook, here, for reference, we provide a quick conversion table between the formalisms of [Rubenstein2017] and [Rischel2020].
Models
Rubenstein [Rubenstein2017] | Rischel [Rischel2020] / Current notebook | Notes |
---|---|---|
SEM | SCM with set of interventions | Causal model |
$\mathcal{M}_X$ | $\mathcal{M}$ | Causal model, usually base or low-level model |
$X$ | $\mathcal{X}_\mathcal{M}$ | Set of (endogenous) variables |
$X_i$ | $X_i$ | (Endogenous) variable |
... | $\mathcal{M}[X_i]$ | Domain of a single variable $X_i$ |
$\mathcal{X}$ | $\prod_i\mathcal{M}[X_i]$ | Domain of all the endogenous variables |
$\mathcal{M}_Y$ | $\mathcal{M'}$ | Causal model, usually abstracted/transformed or high-level model |
$Y$ | $\mathcal{X}_\mathcal{M'}$ | Set of (endogenous) variables |
$Y_i$ | $X'_i$ | (Endogenous) variable |
... | $\mathcal{M'}[X'_i]$ | Domain of a single variable $X'_i$ |
$\mathcal{Y}$ | $\prod_i\mathcal{M'}[X'_i]$ | Domain of all the endogenous variables |
Interventions
Rubenstein [Rubenstein2017] | Rischel [Rischel2020] / Current notebook | Notes |
---|---|---|
$\mathcal{I}_X$ | $\mathcal{I}$ | Set of interventions of interest (poset) for the base model |
$do(i)$ | $\iota_i$ | Intervention for the base model |
$do(X_i=x_i)$ | $do(X_i=x_i)$ | Intervention for the base model |
$\mathcal{M}_X^{do(i)}$ | $\mathcal{M}_{\iota_i}$, $\mathcal{M}_{do}$ | Intervened model, usually base model after intervention |
$\mathcal{I}_Y$ | $\mathcal{I'}$ | Set of interventions of interest (poset) for the abstracted/transformed model |
$do(\omega(i))$ | $\iota'_i$ | Intervention for the abstracted/transformed model |
$do(Y_i=y_i)$ | $do(X'_i=x'_i)$ | Intervention for the abstracted/transformed model |
$\emptyset$ | $\emptyset$ | Null intervention |
$do(\emptyset)$ | $do(\emptyset)$ | Null intervention |
Distributions
Rubenstein [Rubenstein2017] | Rischel [Rischel2020] / Current notebook | Notes |
---|---|---|
$\mathbb{P}_{X}$ | $P_{\mathcal{M}}$ | Distribution, usually joint distribution of the base model |
$\mathbb{P}_{Y}$ | $P_{\mathcal{M'}}$ | Distribution, usually joint distribution of the abstracted/transformed model |
$\mathbb{P}_{X}^{do(i)}$ | $P_{\mathcal{M}_{\iota_i}}$, $P_{\mathcal{M}_{do}}$ | Distribution, usually joint distribution of the intervened base model |
$\mathbb{P}_{Y}^{do(i)}$ | $P_{\mathcal{M'}_{\iota_i}}$, $P_{\mathcal{M'}_{do}}$ | Distribution, usually joint distribution of the intervened abstracted/transformed model |
$\mathcal{P}_X$ | $\mathcal{P}_{\mathcal{M},\mathcal{I}}$ | Set of distributions generated from SCM $\mathcal{M}$ and set of interventions $\mathcal{I}$ (poset) |
Transformations
Rubenstein [Rubenstein2017] | Rischel [Rischel2020] / Current notebook | Notes |
---|---|---|
$\tau:\mathcal{X} \rightarrow \mathcal{Y}$ | $\tau:\prod \mathcal{M}[X_i] \rightarrow \prod \mathcal{M'}[X'_i]$ | Transformation |
$\tau (\mathbb{P}_X) = \mathbb{P}_{\tau({X})}$ | $\tau (P_\mathcal{M}) = P_{\tau(\mathcal{M})} = P_\mathcal{M'}$ | Joint distribution under transformation $\tau$ applied to $\mathcal{M}$. Joint distribution of $\mathcal{M'}$ |
$\tau (\mathbb{P}_X^{do(i)}) = \mathbb{P}_{\tau({X})}^i$ | $\tau (P_{\mathcal{M}_\iota}) = P_{\tau(\mathcal{M}_\iota)} $ | Joint distribution under transformation $\tau$ applied to intervened $\mathcal{M}_\iota$ |
$\mathcal{P}_{\tau(X)}$ | $\mathcal{P}_{\tau({\mathcal{M},\mathcal{I}})}$ | Set of distributions generated by applying the transformation $\tau$ to each distribution generated from SCM/SEM $\mathcal{M}$ and set of interventions $\mathcal{I}$ (poset) |
Notice that, in general, we will use the notation $\mathcal{M'}$ to refer to models generated from $\mathcal{M}$ using either a transformation or an abstraction; we will try to keep the wording consistent. The overall aim of the notebook is to analyze the overlap between abstractions and transformations.
We first review the basic ideas defined in [Rubenstein2017]. We recall our original definition of SCM and we compare it to SEM, and we give a precise definition of interventions.
Let us start having a look at the basic definition of causal models in order to align [Rubenstein2017] to [Rischel2020].
So far we have worked with causal models in the form of a SCM. Recall that a SCM is defined as a tuple:
$$\langle \mathcal{X},\mathcal{E},\mathcal{F},\mathcal{P} \rangle$$where:
Some typical properties and simplifications of SCMs are:
In [Rischel2020] the properties of structurality (1), measurability (2), acyclicity (3), and finiteness (6) are required. Push-forwarding of the exogenous variables (7) holds by virtue of (2). However, independent UEV (4) is not necessary, and so the SCMs under consideration are semi-Markovian instead of Markovian.
In [Rubenstein2017], a causal model is represented as a structural equation model (SEM) $\mathcal{M}$. In [Rubenstein2017] (Definition 1), using the notation presented in the conversion table above, a SEM is defined as a tuple:
$$\langle \mathcal{F}, \mathcal{P}, \mathcal{I} \rangle$$defined over variables:
$$ \mathcal{X,E}$$where:
Notice how, in this definition, we provide a single joint domain for all the variables $\prod_i \mathcal{M}[X_i]$ instead of distinct domains $\mathcal{M}[X_i]$; the joint domain is expressed as the Cartesian product of all the individual domains $\mathcal{M}[X_i]$.
In [Rubenstein2017], a SEM is still endowed with the properties of structurality (1) and measurability (2). Acyclicity (3) is not required. Also, the condition of independent UEV (4) is relaxed so that exogenous variables do not have to be independent; this makes a SEM semi-Markovian instead of Markovian (5). Finiteness (6) is not stated either. Push-forwarding of the exogenous variables (7) still holds by virtue of (2).
Despite the differences we can still find a workable intersection between SEM and SCM. In particular, we can restrict our attention to SEM that are acyclic (3), and are finite in the number of variables and in the cardinality of their domains (6). Such a SEM would match the definition of a (semi-Markovian) SCM.
SEMs satisfying these desiderata clearly form a smaller subset of all possible SEMs for which the results in [Rubenstein2017] hold. However, in this notebook our main concern is not to show the full extent of transformations and their properties, but to try to align the idea of transformation with that of abstraction. We will therefore restrict ourselves to working with a well-behaved set of SEMs.
We can treat SEMs in this class as SCMs equipped with a set of perfect interventions of interest.
In the approach followed in [Rubenstein2017], a SEM (or a SCM equipped with a set of perfect interventions) deals explicitly with a specific set of interventions. Let us review interventions.
(Perfect) interventions allow an experimenter to set exactly a variable (or a set of variables) $X_i$ to a value (or set of values) $x_i$ through the do operator: $do(X_i = x_i)$.
Whenever an intervention $\iota$ is performed on a model $\mathcal{M}$, the underlying DAG and the corresponding joint distribution are changed. An intervention thus practically instantiates a new intervened model $\mathcal{M}_{\iota}$.
A set of interventions has a natural poset structure with respect to inclusion.
For instance, the set $\mathcal{I} = \{ do(X_{1}=x_{1}), do(X_{2}=x_{2}), do(X_{1}=x_{1}, X_2=x_2) \}$ has the following poset structure:
$$ \begin{array}{ccccc} & & do(X_{1}=x_{1},X_{2}=x_{2})\\ & \nearrow & & \nwarrow\\ do(X_{1}=x_{1}) & & & & do(X_{2}=x_{2})\\ & \nwarrow & & \nearrow\\ & & do(\emptyset) \end{array} $$where $do(\emptyset)$ is a null intervention. By transitivity, we also have the inclusion $do(\emptyset) \subset do(X_{1}=x_{1},X_{2}=x_{2})$ which is not explicitly drawn in the diagram.
Arrows in this diagram can be read as inclusions.
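As a small sketch of this poset structure (our own illustration, not code from either paper), a perfect intervention can be represented as a dict of variable assignments, with the poset order read as containment of assignments:

```python
# Hypothetical sketch: perfect interventions as {variable: value} dicts;
# the poset order is containment of the assignments.

def leq(i, j):
    """i <= j in the intervention poset iff every assignment in i also appears in j."""
    return all(var in j and j[var] == val for var, val in i.items())

null = {}                        # do(emptyset), the null intervention
i1 = {"X1": "x1"}                # do(X1 = x1)
i2 = {"X2": "x2"}                # do(X2 = x2)
i12 = {"X1": "x1", "X2": "x2"}   # do(X1 = x1, X2 = x2)

# The diagram above: null below everything (including i12, by transitivity),
# i1 and i2 incomparable, i12 on top.
assert leq(null, i1) and leq(null, i2) and leq(null, i12)
assert leq(i1, i12) and leq(i2, i12)
assert not leq(i1, i2) and not leq(i2, i1)
```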
Since interventions restrict the (joint) domain of the variables of a model, a poset of interventions induces also a poset structure over the domain of variables with respect to a superset relationship.
For instance, in the previous case, if we name $\iota_1 = do(X_{1}=x_{1})$, $\iota_2 = do(X_{2}=x_{2})$ and $\iota_{1,2} = do(X_{1}=x_{1},X_{2}=x_{2})$, we would have the following structure:
$$ \begin{array}{ccccc} & & \prod_i \mathcal{M}_{\iota_{1,2}} [X_i]\\ & \nearrow & & \nwarrow\\ \prod_i \mathcal{M}_{\iota_1} [X_i] & & & & \prod_i \mathcal{M}_{\iota_2} [X_i]\\ & \nwarrow & & \nearrow\\ & & \prod_i \mathcal{M}[X_i] \end{array} $$Arrows in this diagram can be read as supersets (instead of subsets or inclusion). Alternatively, we can take the dual of this diagram (keeping the same objects but reversing the arrows) and read the arrows as inclusion.
A set of interventions also induces a structure over the instantiated causal models.
Again, with reference to the previous case, we would have the following structure:
$$ \begin{array}{ccccc} & & \mathcal{M}_{\iota_{1,2}}\\ & \nearrow & & \nwarrow\\ \mathcal{M}_{\iota_1} & & & & \mathcal{M}_{\iota_2}\\ & \nwarrow & & \nearrow\\ & & \mathcal{M} \end{array} $$where $\mathcal{M}$ is clearly the base model with no intervention (or null intervention).
Arrows in this diagram can be read as interventions.
As each SCM $\mathcal{M}$ has an associated joint distribution $P_{\mathcal{M}}$, a set of interventions also induces a structure over the joint distributions.
Again, reconsidering the previous case, we have:
$$ \begin{array}{ccccc} & & P_{\mathcal{M}_{\iota_{1,2}}}\\ & \nearrow & & \nwarrow\\ P_{\mathcal{M}_{\iota_1}} & & & & P_{\mathcal{M}_{\iota_2}}\\ & \nwarrow & & \nearrow\\ & & P_{\mathcal{M}} \end{array} $$We call this set of distributions $\mathcal{P}_{\mathcal{M,I}}$.
Arrows in this diagram can be read as measurable functions associated with interventions.
We can now discuss how causal models (SCM/SEM) may be related to each other via transformations.
SEMs (or SCMs equipped with a set of perfect interventions) may now be related to each other via simple transformations of (random) variables.
Given a model $\mathcal{M}$, a transformation $\tau$ is a function
$$\tau: \prod_i\mathcal{M}[X_i] \rightarrow \prod_i\mathcal{M'}[X'_i]$$that is, a function between (joint) sets, mapping every possible outcome of the variables in the model $\mathcal{M}$ to the domain of outcomes of the variables in a model $\mathcal{M'}$.
Notice that this transformation does not take into explicit account the details of the transformed model $\mathcal{M'}$ (like the definition of abstraction does); it is defined with respect to the set $\prod_i\mathcal{M}[X_i]$ of the base model $\mathcal{M}$ and the set $\prod_i\mathcal{M'}[X'_i]$ of a transformed model $\mathcal{M'}$. Indeed, in [Rubenstein2017], $\tau$ is referred to as a transformation between random variables, not between causal models.
Now, given the base model $\mathcal{M}$ with joint distribution $P_\mathcal{M}$, the transformation $\tau$ induces by pushforward the distribution $\tau(P_\mathcal{M}) = P_{\tau(\mathcal{M})} = P_\mathcal{M'}$, where the transformed model $\mathcal{M'}$ encodes this distribution given by the transformation $\tau$.
Notice, however, that a joint distribution admits, in general, many factorizations, so $P_{\tau(\mathcal{M})}$ does not pick out a uniquely defined SCM $\mathcal{M'}$.
Analogously, given the intervened joint distribution $P_{\mathcal{M}_\iota}$, the transformation $\tau$ induces by pushforward the distribution $\tau(P_{\mathcal{M}_\iota}) = P_{\tau(\mathcal{M_{\iota}})}$.
Notice that, by definition, there exists a SCM associated with the distribution $P_{\mathcal{M}_\iota}$ which may be derived from the base SCM $\mathcal{M}$: this is simply the SCM $\mathcal{M}_\iota$ produced from the base model $\mathcal{M}$ applying the intervention $\iota$.
$$ \begin{array}{ccc} P_{\mathcal{M}} & & P_{\mathcal{M}_{\iota}}\\ \downarrow & & \downarrow\\ \mathcal{M} & \overset{\iota}{\rightarrow} & \mathcal{M}_{\iota} \end{array} $$where the vertical arrows read as "this joint distribution is encoded by the corresponding SCM".
Now, the same relation does not necessarily hold true for $P_{\tau(\mathcal{M_{\iota}})}$. The distribution $P_{\tau(\mathcal{M_{\iota}})}$ certainly exists, and there exists a SCM $\mathcal{M'_\iota}$ encoding it; however, once we have picked a transformed base model $\mathcal{M'}$, there may be no intervention $\iota'$ that, applied to the transformed base model $\mathcal{M'}$, produces the SCM $\mathcal{M'_\iota}$:
$$ \begin{array}{ccc} \tau(P_{\mathcal{M}})=P_{\tau(\mathcal{M})} & & \tau(P_{\mathcal{M}_{\iota}})=P_{\tau(\mathcal{M}_{\iota})}\\ \downarrow & & \downarrow\\ \mathcal{M}' & \overset{\iota'}{{\color{darkgray}\rightarrow}} & \mathcal{M}'_{\iota'} \end{array} $$where the gray line for $\iota'$ highlights that such an intervention may not exist.
Applying $\tau$ to the set $\mathcal{P}_{\mathcal{M,I}}$ we obtain a new set of distributions $\mathcal{P}_{\tau(\mathcal{M,I})}$ preserving the original structure.
With reference to our example, we get:
$$ \begin{array}{ccccc} & & \tau(P_{\mathcal{M}_{\iota_{1,2}}})\\ & \nearrow & & \nwarrow\\ \tau(P_{\mathcal{M}_{\iota_1}}) & & & & \tau(P_{\mathcal{M}_{\iota_2}})\\ & \nwarrow & & \nearrow\\ & & \tau(P_{\mathcal{M}}) = P_{\mathcal{M'}} \end{array} $$or
$$ \begin{array}{ccccc} & & P_{\tau(\mathcal{M}_{\iota_{1,2}})}\\ & \nearrow & & \nwarrow\\ P_{\tau(\mathcal{M}_{\iota_1})} & & & & P_{\tau(\mathcal{M}_{\iota_2})}\\ & \nwarrow & & \nearrow\\ & & P_{\tau(\mathcal{M})} = P_{\mathcal{M'}} \end{array} $$Arrows in these diagrams can be read as measurable functions, which are not necessarily associated with interventions.
Notice that this structure is defined over transformed distributions $\tau(P_{\mathcal{M}_\iota})$, but the ordering is defined by the interventions $\iota$ which are applied to the base model $\mathcal{M}$. Indeed, $\iota$ refers to an intervention performed on $\mathcal{M}$ before the transformation, NOT to an intervention on a transformed model $\mathcal{M'}$! In other words, the indexing defined by $\iota$ is carried over from the base model, not intrinsic to the transformed model.
The reading of $\tau(P_{\mathcal{M}_{\iota}})$ or $P_{\tau(\mathcal{M}_{\iota})}$ implicitly assumes an order: this is the joint distribution of the transformation of an intervened model, not the joint distribution of the intervention on a transformed model. The notation and signature of the operators uphold this meaning: $\iota$ is an intervention defined to be applied to a base model $\mathcal{M}$; an intervention that is compatible with and applicable to a transformed model would be designated by $\iota'$.
Moving to SCMs, applying $\tau$ implicitly induces an ordering on the models $\tau(\mathcal{M}_\iota)$ through the poset $\mathcal{P}_{\mathcal{M,I}}$.
With reference to our example, we get:
$$ \begin{array}{ccccc} & & \tau({\mathcal{M}_{\iota_{1,2}}})\\ & {\color{darkgray}\nearrow} & & {\color{darkgray}\nwarrow}\\ \tau({\mathcal{M}_{\iota_1}}) & & & & \tau({\mathcal{M}_{\iota_2}})\\ & {\color{darkgray}\nwarrow} & & {\color{darkgray}\nearrow}\\ & & \tau({\mathcal{M}}) = {\mathcal{M'}} \end{array} $$Arrows in these diagrams can be read as interventions whose existence is not guaranteed (hence, they are depicted in gray).
Notice that this structure is rooted in the transformed base model $\mathcal{M'}$, and that, as before, the ordering is defined by the interventions $\iota$ which are applied to the base model $\mathcal{M}$ not to $\mathcal{M'}$.
To summarize, a SCM $\mathcal{M}$ with a set of interventions $\mathcal{I}$ automatically defines several structures: a poset structure over the interventions with arrows being inclusions (first and second column of the diagram below), a poset structure over the joint domains with arrows being inclusions (third column), a poset structure over the intervened models with arrows being interventions (fourth column), and a poset structure over the intervened distributions with arrows being measurable functions associated with interventions (fifth column).
The definition of a transformation $\tau$ induces new structures by transferring them from the space of the base model $\mathcal{M}$ to the space of the transformed model $\mathcal{M'}$: a poset structure over the transformed intervened distributions with arrows being measurable functions associated with interventions in the base model (sixth and seventh column), and a poset structure over the transformed intervened models with arrows being interventions whose existence is not guaranteed (eighth column).
$$ \begin{array}{cccccc} \begin{array}{c} do(X_{1}=x_{1},X_{2}=x_{2})\\ \uparrow\\ do(X_{1}=x_{1})\\ \uparrow\\ do(\emptyset)\\ \\ \mathcal{I} \end{array}\quad & \begin{array}{c} \iota_{1,2}\\ \uparrow\\ \iota_{1}\\ \uparrow\\ \emptyset\\ \\ \mathcal{I} \end{array}\quad & \begin{array}{c} \prod_i \mathcal{M}_{\iota_{1,2}}[X_i]\\ \downarrow\\ \prod_i \mathcal{M}_{\iota_{1}}[X_i]\\ \downarrow\\ \prod_i \mathcal{M}[X_i]\\ \\ \\ \end{array}\quad & \begin{array}{c} \mathcal{M}_{\iota_{1,2}}\\ \uparrow\\ \mathcal{M}_{\iota_{1}}\\ \uparrow\\ \mathcal{M}\\ \\ \\ \end{array}\quad & \begin{array}{c} P_{\mathcal{M}_{\iota_{1,2}}}\\ \uparrow\\ P_{\mathcal{M}_{\iota_{1}}}\\ \uparrow\\ P_{\mathcal{M}}\\ \\ \mathcal{P}_{\mathcal{M},\mathcal{I}} \end{array}\quad & \begin{array}{c} \tau\left(P_{\mathcal{M}_{\iota_{1,2}}}\right)\\ \uparrow\\ \tau\left(P_{\mathcal{M}_{\iota_{1}}}\right)\\ \uparrow\\ \tau\left(P_{\mathcal{M}}\right)=P_{\mathcal{M}'}\\ \\ \mathcal{P}_{\tau\left(\mathcal{M},\mathcal{I}\right)} \end{array} & \quad\begin{array}{c} P_{\tau\left(\mathcal{M}_{\iota_{1,2}}\right)}\\ \uparrow\\ P_{\tau\left(\mathcal{M}_{\iota_{1}}\right)}\\ \uparrow\\ P_{\tau\left(\mathcal{M}\right)}=P_{\mathcal{M}'}\\ \\ \mathcal{P}_{\tau\left(\mathcal{M},\mathcal{I}\right)} \end{array}\end{array}\quad\begin{array}{c} \tau\left(\mathcal{M}_{\iota_{1,2}}\right)\\ {\color{darkgray}\uparrow}\\ \tau\left(\mathcal{M}_{\iota_{1}}\right)\\ {\color{darkgray}\uparrow}\\ \mathcal{M}'\\ \\ \\ \end{array} $$We now go through an example to help illuminate the meaning of a transformation between SEMs and the limitations we discussed above.
We start considering our seasoned example that we analyzed in previous notebooks, and we show how the idea of abstraction may be expressed through transformation.
Let $\mathcal{M}$ be defined over three binary variables $\mathcal{X}_\mathcal{M} = \{S,T,C\}$, encoding whether a patient is a smoker (S), whether he/she developed tar deposits in the lungs (T), and whether he/she developed lung cancer (C). The DAG for our model is the simple chain: $S \rightarrow T \rightarrow C$.
We use the same mechanisms we defined previously:
$\mathcal{M}[\phi_S] = \left[\begin{array}{cc} \frac{4}{5} & \frac{1}{5} \end{array}\right]$

$\mathcal{M}[\phi_T] = \left[\begin{array}{cc} 1 & 0\\ \frac{1}{5} & \frac{4}{5} \end{array}\right]$

$\mathcal{M}[\phi_C] = \left[\begin{array}{cc} \frac{9}{10} & \frac{1}{10}\\ \frac{3}{5} & \frac{2}{5} \end{array}\right]$
We can now compute the joint domain $\mathcal{M}[S] \times \mathcal{M}[T] \times \mathcal{M}[C] = \{0,1\}^3$ and the joint distribution over it:
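The computation can be sketched in a few lines of numpy (a sketch of ours, not the original notebook's code; the variable names and the `einsum` factorization are assumptions):

```python
# Sketch: joint P_M(S,T,C) for the chain S -> T -> C from the stochastic matrices.
import numpy as np

P_S = np.array([4/5, 1/5])                          # M[phi_S]: P(S)
P_T_given_S = np.array([[1, 0], [1/5, 4/5]])        # M[phi_T]: rows S=0, S=1
P_C_given_T = np.array([[9/10, 1/10], [3/5, 2/5]])  # M[phi_C]: rows T=0, T=1

# P(s,t,c) = P(s) * P(t|s) * P(c|t), following the chain factorization.
joint = np.einsum('s,st,tc->stc', P_S, P_T_given_S, P_C_given_T)

assert np.isclose(joint.sum(), 1.0)
assert np.isclose(joint[0, 0, 0], 18/25)  # P(S=0,T=0,C=0)
```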
A SEM is a SCM with a set of interventions of interest. So far we have set up our SCM $\mathcal{M}$ but we have not specified what set of interventions $\mathcal{I}$ we are interested in. Notice that, if we were not to give any further detail, this set may be trivially taken to be $\mathcal{I}=\{do(\emptyset)\}$; in other words, we would have only the null intervention, leaving us with the base model $\mathcal{M}$.
We will take into consideration one potential intervention of interest: forcing the patient not to smoke; this translates into the intervention $\iota_0 = do(S=0)$, the setting of the variable $S$ to $0$. Thus $\mathcal{I}=\{\emptyset, \iota_0\}$. This induces the following basic structures:
$$ \begin{array}{cccc} \begin{array}{c} do(S=0)\\ \uparrow\\ do(\emptyset) \end{array}\quad & \begin{array}{c} \iota_{0}\\ \uparrow\\ \emptyset \end{array}\quad & \begin{array}{c} \{0\} \times \{0,1\} \times \{0,1\}\\ \downarrow\\ \{0,1\} \times \{0,1\} \times \{0,1\} \end{array}\quad & \begin{array}{c} \mathcal{M}_{\iota_{0}}\\ \uparrow\\ \mathcal{M} \end{array}\quad & \begin{array}{c} P_{\mathcal{M}_{\iota_{0}}}\\ \uparrow\\ P_{\mathcal{M}} \end{array}\end{array} $$where the first three vertical arrows read as inclusion, while the last two vertical arrows read as intervention.
Let us now evaluate the model $\mathcal{M}_{\iota_0}$ in the above diagram, that is, the model generated by applying the intervention $\iota_0$ to $\mathcal{M}$.
The intervention $\iota_0 = do(S=0)$ affects only a single node, forcing its value to be $0$. Applying the intervention to $\mathcal{M}$ requires deleting all the incoming edges of the intervened node, and then deterministically setting its value. Since the node $S$ is a root node with no incoming edges, the underlying DAG is left unchanged. However, the stochastic matrices must be updated:
$\mathcal{M_{\iota_0}}[\phi_S] = \left[\begin{array}{cc} 1 & 0 \end{array}\right]$

$\mathcal{M_{\iota_0}}[\phi_T] = \left[\begin{array}{cc} 1 & 0\\ \frac{1}{5} & \frac{4}{5} \end{array}\right]$

$\mathcal{M_{\iota_0}}[\phi_C] = \left[\begin{array}{cc} \frac{9}{10} & \frac{1}{10}\\ \frac{3}{5} & \frac{2}{5} \end{array}\right]$
Only the first stochastic matrix $\mathcal{M_{\iota_0}}[\phi_S]$ changed from the original $\mathcal{M}[\phi_S]$ in order to account for its now-deterministic output; the other stochastic matrices are left unchanged.
Modifying the stochastic matrices affects, of course, our joint distribution. The joint domain $\mathcal{M}_{\iota_0}[S] \times \mathcal{M}_{\iota_0}[T] \times \mathcal{M}_{\iota_0}[C]$ is $\{0\} \times \{0,1\} \times \{0,1\}$, which is isomorphic to $\{0,1\}^2$. The joint distribution is:
which we calculated over $\{0,1\}^3$ to compare it with the previous joint, but which could live in $\{0,1\}^2$ since the last four lines (for $S=1$) carry zero probability.
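As a sketch, the intervened joint can be recomputed by swapping in the deterministic mechanism for $S$ (again, the names are ours):

```python
# Sketch: joint of the intervened model M_{iota_0} = do(S=0); only the
# mechanism of S changes, becoming a point mass on 0.
import numpy as np

P_S_do = np.array([1.0, 0.0])                       # do(S=0)
P_T_given_S = np.array([[1, 0], [1/5, 4/5]])
P_C_given_T = np.array([[9/10, 1/10], [3/5, 2/5]])

joint_do = np.einsum('s,st,tc->stc', P_S_do, P_T_given_S, P_C_given_T)

assert np.isclose(joint_do.sum(), 1.0)
assert np.isclose(joint_do[1].sum(), 0.0)  # all S=1 entries vanish
```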
We now want to implement a transformation $\tau: \{0,1\}^3 \rightarrow \{0,1\}^2$, that is, a mapping of the outcomes of the three variables in the base model $\mathcal{M}$ to the outcomes of the two variables in a transformed model.
We can derive such a transformation from the definition of abstraction $(R,a,\alpha)$ we used in the notebook Categorical Abstraction. The mapping $a: R \rightarrow \mathcal{X}_\mathcal{M'}$ was given by an identity, and each map $\alpha_{X'}: \mathcal{M}[a^{-1}(X')] \rightarrow \mathcal{M'}[X']$ was also an identity. Practically, the abstraction says that the outcomes of relevant variables are mapped to identical outcomes of abstracted variables. It is then easy to work out the form of an equivalent transformation $\tau$:
Notice that what the mapping $\tau$ does is simply to project out the variable $T$: given values $s,c \in \{0,1\}$ for the variables $S$ and $C$, then for any value of $t \in \{0,1\}$ we have $\tau(s,t,c)=(s,c)$. Thus $\tau$ ignores the value of $T$ and maps just the values of $S$ and $C$. This agrees with the definition of abstraction, which excluded $T$ as a non-relevant variable.
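A minimal sketch of this $\tau$ and its pushforward, assuming the joint is stored as a $2\times2\times2$ array indexed by $(s,t,c)$ (our own illustration):

```python
import numpy as np

def tau(s, t, c):
    # Project out T: (s, t, c) -> (s, c).
    return (s, c)

def pushforward(joint_stc):
    """Push a P(S,T,C) array forward through tau by accumulating mass on images."""
    out = np.zeros((2, 2))
    for s in (0, 1):
        for t in (0, 1):
            for c in (0, 1):
                out[tau(s, t, c)] += joint_stc[s, t, c]
    return out

# Because tau ignores t, the pushforward is exactly marginalization over T.
joint = np.random.dirichlet(np.ones(8)).reshape(2, 2, 2)  # any joint works
assert np.allclose(pushforward(joint), joint.sum(axis=1))
```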
Let us now consider the transformed model $\mathcal{M'}$. Instead of relying on the given definition of $\mathcal{M'}$ that we used in the previous notebook, we will derive its joint distribution and work out a DAG with the associated stochastic matrices.
The transformation $\tau$ induces a pushforward of the distribution $P_{\mathcal{M}}(S,T,C)$ onto our new joint distribution of interest $P_{\tau(\mathcal{M})}(S,C)$. We can compute the new distribution by marginalizing out $T$:
Let us now define the DAG underlying our transformed causal model as $S \rightarrow C$, that is, a chain in which the mediating node $T$ has been removed. In contrast with the example in the notebook Categorical Abstraction, where the DAG was given, here we are making an arbitrary choice; in other words, we are not given a DAG by the problem, but we conjure one up in order to solve our problem. The choice of the form of the DAG is of course important, as it amounts to deciding how to factorize the joint. Not all factorizations/DAGs may be compatible with the joint we have.
Next we want to find the stochastic matrices that are compatible with our DAG and joint. To do this, we consider the factorizations implied by our DAG and we set out to solve the system:
$$ \begin{cases} P_{\tau(\mathcal{M})}(S=0)P_{\tau(\mathcal{M})}(C=0\vert S=0)=\frac{18}{25}\\ P_{\tau(\mathcal{M})}(S=0)P_{\tau(\mathcal{M})}(C=1\vert S=0)=\frac{2}{25}\\ P_{\tau(\mathcal{M})}(S=1)P_{\tau(\mathcal{M})}(C=0\vert S=1)=\frac{33}{250}\\ P_{\tau(\mathcal{M})}(S=1)P_{\tau(\mathcal{M})}(C=1\vert S=1)=\frac{17}{250} \end{cases} $$Relying on the fact that probabilities sum up to $1$, we can rewrite:
$$ \begin{cases} P_{\tau(\mathcal{M})}(S=0)P_{\tau(\mathcal{M})}(C=0\vert S=0)=\frac{18}{25}\\ P_{\tau(\mathcal{M})}(S=0)\left(1-P_{\tau(\mathcal{M})}(C=0\vert S=0)\right)=\frac{2}{25}\\ \left(1-P_{\tau(\mathcal{M})}(S=0)\right)P_{\tau(\mathcal{M})}(C=0\vert S=1)=\frac{33}{250}\\ \left(1-P_{\tau(\mathcal{M})}(S=0)\right)\left(1-P_{\tau(\mathcal{M})}(C=0\vert S=1)\right)=\frac{17}{250} \end{cases} $$Substituting for readability, we have to solve the following (non-linear) system:
$$ \begin{cases} ab=\frac{18}{25}\\ a\left(1-b\right)=\frac{2}{25}\\ \left(1-a\right)c=\frac{33}{250}\\ \left(1-a\right)\left(1-c\right)=\frac{17}{250} \end{cases} $$made up of four equations in three unknowns ($a,b,c$). A few algebraic manipulations lead us to the solution:
$$ \begin{cases} a=\frac{4}{5}\\ b=\frac{9}{10}\\ c=\frac{33}{50} \end{cases} $$The value of $a$ represents $P_{\tau(\mathcal{M})}(S=0)$, and it allows us to determine $\mathcal{M'}[\phi_{S}]$; the values of $b,c$ represent respectively $P_{\tau(\mathcal{M})}(C=0\vert S=0)$ and $P_{\tau(\mathcal{M})}(C=0\vert S=1)$, which allow us to define $\mathcal{M'}[\phi_{C}]$:
$\mathcal{M'}[\phi_{S}] = \left[\begin{array}{cc} \frac{4}{5} & \frac{1}{5} \end{array}\right]$

$\mathcal{M'}[\phi_{C}] = \left[\begin{array}{cc} \frac{9}{10} & \frac{1}{10}\\ \frac{33}{50} & \frac{17}{50} \end{array}\right]$
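The system can also be solved mechanically: marginalize the pushed-forward joint to get $a$, then condition. A closed-form sketch with exact fractions (our own, with hypothetical names):

```python
# Sketch: recover the factorization P(S)P(C|S) of the DAG S -> C from the joint.
from fractions import Fraction as F

# The pushed-forward joint P_{tau(M)}(S, C), as computed above.
P = {(0, 0): F(18, 25), (0, 1): F(2, 25),
     (1, 0): F(33, 250), (1, 1): F(17, 250)}

a = P[(0, 0)] + P[(0, 1)]   # P(S=0), by marginalization
b = P[(0, 0)] / a           # P(C=0 | S=0), by conditioning
c = P[(1, 0)] / (1 - a)     # P(C=0 | S=1)

assert (a, b, c) == (F(4, 5), F(9, 10), F(33, 50))
```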
Notice that, in general, the form of the system of equations we have to solve will depend on the way in which the joint distribution factorizes; in other words, the number of variables and the relations expressed in the DAG will define the set of equations we have to solve.
Now, the new stochastic matrices $\mathcal{M'}[\phi_{S}]$ and $\mathcal{M'}[\phi_{C}]$ that we computed correspond indeed to the stochastic matrices that were defined in the previous example in the notebook Categorical Abstraction.ipynb. So our previous abstraction, expressed in terms of $(R,a,\alpha_{X'})$, has been successfully translated into a transformation $\tau$.
We now apply the same transformation $\tau$ to the intervened model $\mathcal{M}_{\iota_0}$, and we derive its distribution $P_{\tau(\mathcal{M}_{\iota_0})}$.
Let us compute the pushforward of the distribution $P_{\mathcal{M}_{\iota_0}}(S,T,C)$ via $\tau$:
Next we need to specify how our joint distribution $P_{\tau(\mathcal{M}_{\iota_0})}(S,C)$ factorizes. We assume the underlying DAG to be the same as the DAG of the transformed base model $\mathcal{M'}$, that is $S \rightarrow C$. We only need to find proper stochastic matrices compatible with the DAG and the joint. One possible solution is:
$\tau(\mathcal{M_{\iota_0}})[\phi_{S}] = \left[\begin{array}{cc} 1 & 0 \end{array}\right]$

$\tau(\mathcal{M_{\iota_0}})[\phi_{C}] = \left[\begin{array}{cc} \frac{9}{10} & \frac{1}{10}\\ x & (1-x) \end{array}\right]$
where $x\in [0,1]$. We thus actually have an infinite number of models compatible with the joint $P_{\tau(\mathcal{M}_{\iota_0})}(S,C)$. Intuitively, this is because the conditional $P_{\tau(\mathcal{M_{\iota_0}})}(C\vert S=1)$ can assume any value, since the probability of $S=1$ is zero.
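A quick sketch confirming this degeneracy (our own check): since $P(S=1)=0$ after the intervention, the second row of $\phi_C$ never contributes to the joint, whatever $x$ is:

```python
# Sketch: every x in [0,1] yields the same joint over (S, C) under do(S=0).
import numpy as np

P_S = np.array([1.0, 0.0])  # do(S=0) forces P(S=1) = 0
for x in (0.0, 0.37, 33/50, 1.0):
    P_C_given_S = np.array([[9/10, 1/10], [x, 1 - x]])
    joint = np.einsum('s,sc->sc', P_S, P_C_given_S)
    # The second row of phi_C is multiplied by P(S=1) = 0, so it drops out.
    assert np.allclose(joint, [[0.9, 0.1], [0.0, 0.0]])
```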
We have now computed the transformed distributions $P_{\tau(\mathcal{M})}$ and $P_{\tau(\mathcal{M}_{\iota_0})}$, as well as the associated models $\mathcal{M'}$ and $\tau(\mathcal{M_{\iota_0}})$. We can then show their relationship in the following diagram:
$$ \begin{array}{ccc} \begin{array}{c} \tau\left(P_{\mathcal{M}_{\iota_{0}}}\right)\\ \uparrow\\ \tau\left(P_{\mathcal{M}}\right)=P_{\mathcal{M}'} \end{array}\quad & \begin{array}{c} P_{\tau\left(\mathcal{M}_{\iota_{0}}\right)}\\ \uparrow\\ P_{\tau\left(\mathcal{M}\right)}=P_{\mathcal{M}'} \end{array}\quad & \begin{array}{c} \tau\left(\mathcal{M}_{\iota_{0}}\right)\\ {\color{darkgray}\uparrow}\\ \mathcal{M}' \end{array}\end{array} $$where the first two vertical arrows read as measurable functions, while the last gray one reads as an intervention whose existence is in question.
The question is: is there a perfect intervention $\iota'_0$ that, applied to $\mathcal{M'}$, would generate $\tau(\mathcal{M_{\iota_0}})$?
First we must notice that ${\tau(\mathcal{M_{\iota_0}})}$ is still underspecified, since it is parametrized by $x$. We then use the constraint introduced by the structure above to set the value $x$ to $\frac{33}{50}$. Now the model ${\tau(\mathcal{M_{\iota_0}})}$ is specified by the following stochastic matrices:
$\tau(\mathcal{M_{\iota_0}})[\phi_{S}] = \left[\begin{array}{cc} 1 & 0 \end{array}\right]$

$\tau(\mathcal{M_{\iota_0}})[\phi_{C}] = \left[\begin{array}{cc} \frac{9}{10} & \frac{1}{10}\\ \frac{33}{50} & \frac{17}{50} \end{array}\right]$
It is easy to see that there exists an intervention $\iota'_0$ on $\mathcal{M'}$ that generates the model $\tau(\mathcal{M}_{\iota_0})$: this is simply $\iota'_0 = do(S=0)$, the intervention that sets the value of the smoking variable $S$ in $\mathcal{M'}$ to $0$. So the final diagram for the models in the space of transformed models may be written as:
$$ \begin{array}{c} \tau\left(\mathcal{M}_{\iota_{0}}\right)\\ \uparrow\\ \mathcal{M}' \end{array} $$with a solid black line.
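This consistency check can also be sketched numerically (our own code, with assumed names): intervening on $\mathcal{M'}$ with $do(S=0)$ and pushing the intervened base model forward through $\tau$ give the same joint over $(S,C)$:

```python
# Sketch: do(S=0) on M' reproduces the pushforward of the intervened base model
# (with the free parameter x fixed to 33/50).
import numpy as np

# Intervene on M': replace P(S') with the point mass on 0.
P_C_given_S_prime = np.array([[9/10, 1/10], [33/50, 17/50]])
joint_Mprime_do = np.einsum('s,sc->sc', np.array([1.0, 0.0]), P_C_given_S_prime)

# Pushforward of the intervened base model: tau maps (s,t,c) to (s,c),
# so we marginalize T out of P_{M_{iota_0}}.
P_S_do = np.array([1.0, 0.0])
P_T_given_S = np.array([[1, 0], [1/5, 4/5]])
P_C_given_T = np.array([[9/10, 1/10], [3/5, 2/5]])
pushed = np.einsum('s,st,tc->sc', P_S_do, P_T_given_S, P_C_given_T)

assert np.allclose(joint_Mprime_do, pushed)  # the diagram commutes here
```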
We now look at a counterexample, a case in which we define a SCM $\mathcal{M}$, a set of interventions $\mathcal{I}$, a transformation $\tau$, and a transformed model $\mathcal{M'}$, and we then conclude that there are no interventions on $\mathcal{M'}$ consistent with the transformation of the intervened models.
Let us consider a simple model $\mathcal{M}$ defined over two binary variables $\mathcal{X}_\mathcal{M} = \{A,B\}$. Let us specify the mechanisms through their stochastic matrices:
$\mathcal{M}[\phi_A] = \left[\begin{array}{cc} \frac{1}{2} & \frac{1}{2} \end{array}\right]$

$\mathcal{M}[\phi_B] = \left[\begin{array}{cc} \frac{1}{3} & \frac{2}{3}\\ \frac{1}{100} & \frac{99}{100} \end{array}\right]$
These mechanisms imply that the DAG of our model is simply: $A \rightarrow B$.
Let us now consider the joint domain $\mathcal{M}[A] \times \mathcal{M}[B] = \{0,1\}^2$ of all the variables; we can immediately compute its joint distribution:
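A sketch of this computation (names are ours):

```python
# Sketch: joint P_M(A,B) for the counterexample chain A -> B.
import numpy as np

P_A = np.array([1/2, 1/2])                              # M[phi_A]: P(A)
P_B_given_A = np.array([[1/3, 2/3], [1/100, 99/100]])   # M[phi_B]: rows A=0, A=1

# P(a,b) = P(a) * P(b|a)
joint_AB = np.einsum('a,ab->ab', P_A, P_B_given_A)

assert np.isclose(joint_AB.sum(), 1.0)
assert np.isclose(joint_AB[0, 0], 1/6)  # P(A=0,B=0)
```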
In this example, we will consider one intervention of interest, that is $\iota_0 = do(A=0)$, the setting of the variable $A$ to the value $0$. Thus $\mathcal{I}=\{\emptyset, \iota_0\}$. This induces the following basic structures:
$$ \begin{array}{cccc} \begin{array}{c} do(A=0)\\ \uparrow\\ do(\emptyset) \end{array}\quad & \begin{array}{c} \iota_{0}\\ \uparrow\\ \emptyset \end{array}\quad & \begin{array}{c} \{0\}\times\{0,1\}\\ \downarrow\\ \{0,1\}\times\{0,1\} \end{array}\quad & \begin{array}{c} \mathcal{M}_{\iota_{0}}\\ \uparrow\\ \mathcal{M} \end{array}\quad & \begin{array}{c} P_{\mathcal{M}_{\iota_{0}}}\\ \uparrow\\ P_{\mathcal{M}} \end{array}\end{array} $$where the first three vertical arrows read as inclusion, while the last two vertical arrows read as intervention.
Notice that, except for the variable number and names, these structures are the same as in the previous example.
Let us now evaluate the model $\mathcal{M}_{\iota_0}$ in the above diagram, that is, the model generated by applying the intervention $\iota_0$ to $\mathcal{M}$.
The intervention $\iota_0 = do(A=0)$ affects the stochastic matrices as follows:
$\mathcal{M_{\iota_0}}[\phi_A] = \left[\begin{array}{cc} 1 & 0 \end{array}\right]$
$\mathcal{M_{\iota_0}}[\phi_B] = \left[\begin{array}{cc} \frac{1}{3} & \frac{2}{3}\\ \frac{1}{100} & \frac{99}{100} \end{array}\right]$
We changed the old $\mathcal{M}[\phi_A]$ to the new $\mathcal{M_{\iota_0}}[\phi_A]$ so that it deterministically outputs $0$, while $\mathcal{M_{\iota_0}}[\phi_B]$ is unchanged from $\mathcal{M}[\phi_B]$.
The joint domain ${\mathcal{M_{\iota_0}}}[A] \times {\mathcal{M_{\iota_0}}}[B]$ is $\{0\} \times \{0,1\}$, isomorphic to $\{0,1\}$, but we will still use $\{0,1\}^2$ to compute the joint distribution for the sake of comparison:
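The same sketch as before, with only the mechanism of $A$ replaced (names ours):

```python
import numpy as np

# Under do(A=0) the mechanism of A becomes a point mass on 0.
phi_A_do = np.array([1.0, 0.0])
phi_B = np.array([[1/3, 2/3],
                  [1/100, 99/100]])    # unchanged by the intervention

# Joint of the intervened model, still laid out on {0,1}^2 for comparison.
joint_do = phi_A_do[:, None] * phi_B   # [[1/3, 2/3], [0, 0]]
print(joint_do)
```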
Let us now implement a transformation $\tau: \{0,1\}^2 \rightarrow \{0,1\}^2$. The domain is of course given by the base model $\mathcal{M}$, while in choosing the codomain we implicitly state that the transformed model will be defined over two binary variables.
We consider the following $\tau:$
Let us now discuss the transformed model $\mathcal{M'}$. The transformation $\tau$ specifies the set of variables of $\mathcal{M'}$, their domains and the joint distribution over them.
Specifically, we have that the model $\mathcal{M'}$ must be defined over two binary variables, so we will express $\mathcal{X}_\mathcal{M'} = \{A',B'\}$.
Moreover the pushforward of the distribution $P_{\mathcal{M}}(A,B)$ via $\tau$ induces the following distribution:
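The pushforward is just a re-binning of probability mass. A sketch, where the concrete $\tau$ written out below is our inference from the probabilities used in this example ($(0,0) \mapsto (0,0)$, everything else $\mapsto (1,1)$), not a definition taken from the text:

```python
import numpy as np

# Joint of the base model M, computed earlier.
joint = np.array([[1/6, 1/3],
                  [1/200, 99/200]])

# One tau consistent with the distributions of this example (an assumption).
def tau(a, b):
    return (0, 0) if (a, b) == (0, 0) else (1, 1)

# Pushforward: accumulate the mass of every (a, b) onto tau(a, b).
pushforward = np.zeros((2, 2))
for a in (0, 1):
    for b in (0, 1):
        pushforward[tau(a, b)] += joint[a, b]
print(pushforward)   # mass 1/6 on (0,0) and 5/6 on (1,1)
```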
We are then left with specifying how our joint distribution $P_{\tau(\mathcal{M})}(A',B')$ factorizes into our new SCM. We start by choosing the DAG underlying our causal model, making it similar to the base model: $A' \rightarrow B'$. Next we formulate the stochastic matrices that are compatible with this DAG and the joint:
$\mathcal{M'}[\phi_{A'}] = \left[\begin{array}{cc} \frac{1}{6} & \frac{5}{6} \end{array}\right]$
$\mathcal{M'}[\phi_{B'}] = \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]$
Remember that this computation is the same as specifying the distributions $P_{\tau(\mathcal{M})}(A')$ and $P_{\tau(\mathcal{M})}(B'\vert A')$ in which the joint $P_{\tau(\mathcal{M})}(A',B')$ factorizes.
We have now completely specified the model $\mathcal{M'}$ onto which the transformation $\tau$ maps $\mathcal{M}$, and we can claim the relabeling $P_{\tau(\mathcal{M})} = P_{\mathcal{M'}}$.
We now apply the same transformation $\tau$ to the intervened model $\mathcal{M}_{\iota_0}$, and we derive its distribution $P_{\tau(\mathcal{M}_{\iota_0})}$.
Let us compute the pushforward of the distribution $P_{\mathcal{M}_{\iota_0}}(A,B)$ via $\tau$:
Next we need to specify how our joint distribution $P_{\tau(\mathcal{M}_{\iota_0})}(A',B')$ factorizes. In this case we are already given an underlying DAG for our SCM, that is, the DAG of the transformed base model $\mathcal{M'}$: $A' \rightarrow B'$. We only need to find proper stochastic matrices compatible with the DAG and the joint:
$\tau(\mathcal{M}_{\iota_0})[\phi_{A'}] = \left[\begin{array}{cc} \frac{1}{3} & \frac{2}{3} \end{array}\right]$
$\tau(\mathcal{M}_{\iota_0})[\phi_{B'}] = \left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]$
We have now computed the transformed distributions $P_{\tau(\mathcal{M})}$ and $P_{\tau(\mathcal{M_{\iota_0}})}$, as well as the associated models $\mathcal{M'}$ and $\tau(\mathcal{M_{\iota_0}})$. We can then show their relationship in the following diagram:
$$ \begin{array}{ccc} \begin{array}{c} \tau\left(P_{\mathcal{M}_{\iota_{0}}}\right)\\ \uparrow\\ \tau\left(P_{\mathcal{M}}\right)=P_{\mathcal{M}'} \end{array}\quad & \begin{array}{c} P_{\tau\left(\mathcal{M}_{\iota_{0}}\right)}\\ \uparrow\\ P_{\tau\left(\mathcal{M}\right)}=P_{\mathcal{M}'} \end{array}\quad & \begin{array}{c} \tau\left(\mathcal{M}_{\iota_{0}}\right)\\ {\color{darkgray}\uparrow}\\ \mathcal{M}' \end{array}\end{array} $$where the first two vertical arrows read as measurable functions, while the last, gray one reads as a questionable intervention.
The question is: is there a perfect intervention $\iota'_0$ that, applied to $\mathcal{M'}$, would generate $\tau(\mathcal{M_{\iota_0}})$?
It is indeed easy to see that such a perfect intervention $\iota'_0$ does not exist. Starting from the SCM $\mathcal{M'}$, whose outputs are $P_{\mathcal{M'}}(A'=0,B'=0) = \frac{1}{6}$ and $P_{\mathcal{M'}}(A'=1,B'=1) = \frac{5}{6}$, there is no way to intervene on this model by fixing the value of one of its variables such that the final probabilities would be $P_{\mathcal{M'}_{\iota'_0}}(A'=0,B'=0) = \frac{1}{3}$ and $P_{\mathcal{M'}_{\iota'_0}}(A'=1,B'=1) = \frac{2}{3}$.
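Since $\mathcal{M'}$ is finite, this claim can be checked exhaustively over the four perfect single-variable interventions. A sketch (matrix names and helper are ours):

```python
import numpy as np

phi_A = np.array([1/6, 5/6])             # P(A') in M'
phi_B = np.eye(2)                        # P(B' | A') in M' (identity)
target = np.array([[1/3, 0], [0, 2/3]])  # joint of tau(M_{iota_0})

def joint(pA, pB_given_A):
    """Joint P(A'=a, B'=b) from a marginal and a conditional matrix."""
    return pA[:, None] * pB_given_A

candidates = {}
for v in (0, 1):
    # do(A'=v): replace P(A') with a point mass on v.
    candidates[f"do(A'={v})"] = joint(np.eye(2)[v], phi_B)
    # do(B'=v): replace P(B'|A') with a point mass on v in every row.
    candidates[f"do(B'={v})"] = joint(phi_A, np.tile(np.eye(2)[v], (2, 1)))

# No perfect intervention reproduces the target joint.
print(any(np.allclose(j, target) for j in candidates.values()))  # False
```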
In conclusion, the final diagram for the models in the space of transformed models must be written as:
$$ \begin{array}{c} \tau\left(\mathcal{M}_{\iota_{0}}\right)\\ \\ \mathcal{M}' \end{array} $$with no arrow.
A transformation $\tau$ induces a relationship between a base SCM $\mathcal{M}$ and a transformed SCM $\mathcal{M'}$ via a mapping of the joint domain of their variables $\prod_i \mathcal{M}[X_i] \rightarrow \prod_i \mathcal{M'}[X'_i]$.
Given a SCM $\mathcal{M}$ and a transformation $\tau$:
Considering an intervention $\iota \in \mathcal{I}$:
As we have seen in the two examples above, a transformation $\tau$ may sometimes generate acceptable transformed intervened models, while at other times it may fail. More precisely, let $\mathcal{M}$ be a SCM with a set of interventions of interest $\mathcal{I}$, and let $\tau$ be a transformation. If we consider the transformed base model $\tau(\mathcal{M}) = \mathcal{M'}$ and a transformed intervened model $\tau(\mathcal{M}_\iota)$, for $\iota \in \mathcal{I}$, it is not guaranteed that there exists an intervention $\iota'$ such that $\mathcal{M'}_{\iota'} = \tau(\mathcal{M}_\iota)$. The existence of such an intervention undergirds the definition of an exact transformation.
We now provide the definition of exact transformation from [Rubenstein2017].
Let $\mathcal{M}$ be a SCM with a set of interventions of interest $\mathcal{I}$, and let $\mathcal{M'}$ be another SCM. Let $\tau: \prod_i \mathcal{M}[X_i] \rightarrow \prod_i \mathcal{M'}[X'_i]$ be a transformation between the models. Then $\mathcal{M'}$ is an exact $\tau$-transformation of $\mathcal{M}$ if there exists a surjective order-preserving map $\omega: \mathcal{I} \rightarrow \mathcal{I'}$ such that $\tau(P_{\mathcal{M}_\iota}) = P_{\mathcal{M'}_{\omega(\iota)}}$, for all $\iota \in \mathcal{I}$.
In other words, for every intervention $\iota \in \mathcal{I}$ applied to the base model, there exists an intervention $\omega(\iota) = \iota' \in \mathcal{I'}$ applied to the transformed base model, and the map $\omega$ preserves the poset structure of the interventions in $\mathcal{I}$.
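Given candidate sets of distributions and a candidate $\omega$, the definition can be checked mechanically on finite models. A sketch (the function and the encoding are ours; when reusing Example (II) we restrict the distributions to the support $\{(0,0),(1,1)\}$):

```python
import numpy as np

def is_exact(pushforwards, primed, omega, leq, leq_prime):
    """Check whether omega witnesses an exact transformation.

    pushforwards: dict iota  -> tau(P_{M_iota})   (numpy arrays)
    primed:       dict iota' -> P_{M'_{iota'}}    (numpy arrays)
    omega:        dict iota  -> iota'
    leq, leq_prime: sets of pairs encoding the poset orders on I and I'
    """
    surjective = set(omega.values()) == set(primed)
    dists_match = all(np.allclose(pushforwards[i], primed[omega[i]])
                      for i in pushforwards)
    order_preserved = all((omega[a], omega[b]) in leq_prime for a, b in leq)
    return surjective and dists_match and order_preserved

# Example (II): no omega can match tau(P_{M_{iota_0}}) = (1/3, 2/3);
# here we try mapping iota_0 to do(A'=0) and fail.
pushforwards = {'empty': np.array([1/6, 5/6]), 'i0': np.array([1/3, 2/3])}
primed = {'empty': np.array([1/6, 5/6]), "do(A'=0)": np.array([1.0, 0.0])}
omega = {'empty': 'empty', 'i0': "do(A'=0)"}
print(is_exact(pushforwards, primed, omega,
               leq={('empty', 'i0')}, leq_prime={('empty', "do(A'=0)")}))
```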
The definition of an exact transformation is grounded on the existence of a map $\omega: \mathcal{I} \rightarrow \mathcal{I'}$ with some important properties: (i) it is a function; (ii) it is surjective; (iii) it is order-preserving. This has the following implications:
Two important properties of exact transformations are proved in [Rubenstein2017]:
Transitivity guarantees us that exact transformations can be safely composed together and return a new exact transformation.
where, as by the above convention, $\omega(\iota_i)=\iota'_i$ and $\omega(\iota_j)=\iota'_j$.
Consistency guarantees that transformations and interventions commute, leading to the same joint distribution.
A simple example of an exact transformation is Example (I) above (indeed, it was labelled good because it met beforehand the specification of an exact transformation!).
Recall that in the example we had:
It is easy to show that there exists a map $\omega: \mathcal{I} \rightarrow \mathcal{I'}$ defined as:
Taking $\mathcal{I'} = \{ \emptyset, \iota'_0 = do(S=0)\}$, $\omega$ is surjective and preserves the poset structure of the set of interventions:
$$ \begin{array}{c} \mathcal{\iota}_{0}\\ \sideset{}{}\uparrow\\ \emptyset \end{array}\qquad\qquad\begin{array}{c} \iota'_{0}\\ \sideset{}{}\uparrow\\ \emptyset \end{array} $$Thus, this is an exact transformation.
Let us now consider the pathological case of a transformation for which we have a non-order-preserving $\omega$. We will rely on Example 7 in [Rubenstein2017].
Let us consider a model $\mathcal{M}$ defined over three variables $\mathcal{X}_\mathcal{M} = \{X_1, X_2, X_3\}$. The original example considers variables defined over the real numbers; here, to comply with the assumption of finiteness, and without affecting the point of the example, we take the variables to be defined on a discrete domain $\mathbb{D} = \{1,2,...,D\}$; this may be imagined as the discretization of a continuous variable on a computer.
Next, we instantiate the mechanisms through their stochastic matrices. Again, to comply with our assumption of independent UEV, we force the exogenous variables to be independent without changing the nature of the example. We then have the following stochastic matrices:
The DAG implied by the mechanisms in our model is:
$$ \begin{array}{ccc} X_{1} & \rightarrow & X_{2}\\ \searrow & & \swarrow\\ & X_{3} \end{array} $$Notice that, although our formalism of stochastic matrices does not make it very evident, the contributions of $X_1$ and $X_2$ to the node $X_3$ cancel each other out, leaving only $X_3 \sim P_{X_3}$.
We take as interventions of interest $\mathcal{I}=\{\emptyset, \iota_0 = do(X_2=0), \iota_1 = do(X_1=0,X_2=0) \}$. This induces the following basic structures:
$$ \begin{array}{cccc} \begin{array}{c} do(X_{1}=0,X_{2}=0)\\ \uparrow\\ do(X_{2}=0)\\ \uparrow\\ do(\emptyset) \end{array}\quad & \begin{array}{c} \iota_{1}\\ \uparrow\\ \iota_{0}\\ \uparrow\\ \emptyset \end{array}\quad & \begin{array}{c} \{0\}\times\{0\}\times\mathbb{D}\\ \downarrow\\ \{0\}\times\mathbb{D}^2\\ \downarrow\\ \mathbb{D}^3 \end{array}\quad & \begin{array}{c} \mathcal{M}_{\iota_{1}}\\ \uparrow\\ \mathcal{M}_{\iota_{0}}\\ \uparrow\\ \mathcal{M} \end{array}\quad & \begin{array}{c} P_{\mathcal{M}_{\iota_{1}}}\\ \uparrow\\ P_{\mathcal{M}_{\iota_{0}}}\\ \uparrow\\ P_{\mathcal{M}} \end{array}\end{array} $$Let us now implement a transformation $\tau: \mathbb{D}^3 \rightarrow \mathbb{D}^2$, that is from the model $\mathcal{M}$ defined over three discrete variables to a transformed model $\mathcal{M'}$ defined over two discrete variables. We define $\tau$ as:
Our model $\mathcal{M'}$ for our transformation will be defined over two variables $\mathcal{X}_\mathcal{M'} = \{X'_1, X'_2\}$ each one with a discrete domain $\mathbb{D} = \{1,2,...,D\}$.
The two mechanisms are defined as follows:
The simplified DAG for the transformed model is:
$$ \begin{array}{c} X'_{1}\\ \downarrow\\ X'_{2} \end{array} $$Notice that, under transformation $\tau$, the value of $X'_1 = X_1 + X_2$ is identically zero by definition, while the value of $X'_2 = X_3$ is preserved from the base model to the transformed model.
Let us now institute the following mapping $\omega: \mathcal{I} \rightarrow \mathcal{I'}$ from the set of interventions on the base model to the set of interventions on the transformed model:
Surjectivity. The set of interventions on the transformed model is taken to be $\mathcal{I'}=\{ \emptyset, do(X'_1=0) \}$. Thus, trivially, $\omega$ is surjective.
Commutativity. Moreover, we want to show that the joint distribution $\tau(P_{\mathcal{M}_\iota})$ of the transformed intervened models is the same as the joint distribution $P_{\mathcal{M'}_{\iota'}}$ of the intervened transformed models. We will show this in all three cases considered above.
Let us consider the intervention $\emptyset \in \mathcal{I}$ and $do(X'_1=0) \in \mathcal{I'}$:
First, let us intervene and then transform:
$$\begin{array}{c} X_1 \sim P_{X_1}\\ X_2 = -X_1\\ X_3 \sim P_{X_3} \end{array}$$
$$\begin{array}{c} X'_1 = 0\\ X'_2 \sim P_{X_3} \end{array}$$
Second, let us transform and then intervene:
$$\begin{array}{c} X'_1 \sim P_{X_1}\\ X'_2 \sim X'_1 + P_{X_3} \end{array}$$
$$\begin{array}{c} X'_1 = 0\\ X'_2 \sim P_{X_3} \end{array}$$
Thus, the final distributions are the same independently of the order of transformation and intervention.
Let us consider the intervention $do(X_2=0) \in \mathcal{I}$ and $\emptyset \in \mathcal{I'}$:
First, let us intervene and then transform:
$$\begin{array}{c} X_1 \sim P_{X_1}\\ X_2 = 0\\ X_3 \sim X_1 + P_{X_3} \end{array}$$
$$\begin{array}{c} X'_1 \sim P_{X_1}\\ X'_2 \sim X_1 + P_{X_3} \end{array}$$
Second, let us transform and then intervene:
$$\begin{array}{c} X'_1 \sim P_{X_1}\\ X'_2 \sim X'_1 + P_{X_3} \end{array}$$
$$\begin{array}{c} X'_1 \sim P_{X_1}\\ X'_2 \sim X'_1 + P_{X_3} \end{array}$$
Thus, the final distributions are the same independently of the order of transformation and intervention.
Finally, let us consider the intervention $do(X_1=0,X_2=0) \in \mathcal{I}$ and $do(X'_1=0) \in \mathcal{I'}$:
First, let us intervene and then transform:
$$\begin{array}{c} X_1 = 0\\ X_2 = 0\\ X_3 \sim P_{X_3} \end{array}$$
$$\begin{array}{c} X'_1 = 0\\ X'_2 \sim P_{X_3} \end{array}$$
Second, let us transform and then intervene:
$$\begin{array}{c} X'_1 \sim P_{X_1}\\ X'_2 \sim X'_1 + P_{X_3} \end{array}$$
$$\begin{array}{c} X'_1 = 0\\ X'_2 \sim P_{X_3} \end{array}$$
Thus, the final distributions are the same independently of the order of transformation and intervention.
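The checks above can also be verified numerically. A Monte Carlo sketch for the second pair, $do(X_2=0)$ and $\emptyset$, using real-valued variables with standard normal $P_{X_1}$ and $P_{X_3}$ (an assumption standing in for the discretized domains):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Intervene do(X2=0) on M, then transform with tau(x1, x2, x3) = (x1 + x2, x3).
x1 = rng.standard_normal(n)
x2 = np.zeros(n)                        # do(X2 = 0)
x3 = x1 + x2 + rng.standard_normal(n)   # mechanism of X3
a = np.stack([x1 + x2, x3])             # samples of (X'_1, X'_2)

# Transform M into M', then apply omega(do(X2=0)) = empty intervention.
xp1 = rng.standard_normal(n)            # X'_1 ~ P_{X1}
xp2 = xp1 + rng.standard_normal(n)      # X'_2 = X'_1 + noise
b = np.stack([xp1, xp2])

# Both orders should yield the same joint; compare first and second moments.
print(np.allclose(a.mean(axis=1), b.mean(axis=1), atol=0.05))
print(np.allclose(np.cov(a), np.cov(b), atol=0.05))
```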
Order-preservation. Unfortunately, though, the $\omega$ mapping is not order-preserving. Let us take a look at the posets of interventions $\mathcal{I}$ (on the left) and $\mathcal{I'}$ (on the right):
$$ \begin{array}{ccc} \begin{array}{c} do(X_{1}=0,X_{2}=0)\\ \uparrow\\ do(X_{2}=0)\\ \uparrow\\ do(\emptyset) \end{array} & \quad & \begin{array}{c} \\ do(X'_{1}=0)\\ \uparrow\\ do(\emptyset)\\ \\ \end{array}\end{array} $$It is easy to see that order is not preserved, as $\omega$ maps $do(\emptyset)$ to $do(X'_{1}=0)$, and $do(X_{2}=0)$ to $do(\emptyset)$, thus reversing their order.
In conclusion, then, the transformation $\tau$ we considered is not exact, as the associated mapping $\omega$ fails to preserve the order of the set of interventions of interest.
Example 8 in [Rubenstein2017] further illustrates that just mapping null-interventions in $\mathcal{I}$ to null-interventions in $\mathcal{I'}$ is not sufficient to have exact transformations. We do need order-preservation.
A transformation $\tau$ between models $\mathcal{M}$ and $\mathcal{M'}$, considering a set of interventions of interest $\mathcal{I}$, may come in different gradations according to the properties it satisfies:
A non-transformation: a relation $\prod_i \mathcal{M}[X_i] \rightarrow \prod_i \mathcal{M'}[X'_i]$ which is not a function.
A transformation that does not admit (some) interventions: a function $\tau: \prod_i \mathcal{M}[X_i] \rightarrow \prod_i \mathcal{M'}[X'_i]$, such that there exists no $\iota'=\omega(\iota)$, for some $\iota \in \mathcal{I}$, that, applied to the transformed base model $\mathcal{M'}$, would produce the transformed intervened model $\tau(\mathcal{M}_\iota)$. See Example (II).
A transformation without a surjection between interventions: a function $\tau: \prod_i \mathcal{M}[X_i] \rightarrow \prod_i \mathcal{M'}[X'_i]$, such that there is a map $\omega: \mathcal{I} \rightarrow \mathcal{I'}$, with $\omega$ non-surjective. This means that the target set of interventions is somehow overspecified: some interventions in the transformed model have no corresponding intervention in the base model. Non-surjectivity may be solved by restricting the set of interventions of interest $\mathcal{I'}$ in the transformed model, if feasible.
A transformation that does not preserve the order of interventions: a function $\tau: \prod_i \mathcal{M}[X_i] \rightarrow \prod_i \mathcal{M'}[X'_i]$, such that there is a surjective map $\omega: \mathcal{I} \rightarrow \mathcal{I'}$ that does not preserve the order of the poset $\mathcal{I}$. See Example (IV).
An exact transformation: a function $\tau: \prod_i \mathcal{M}[X_i] \rightarrow \prod_i \mathcal{M'}[X'_i]$, such that there is a surjective, order-preserving map $\omega: \mathcal{I} \rightarrow \mathcal{I'}$. See Example (III) and Example (I).
We now recall the definition of abstraction from [Rischel2020], as we used it in the notebook Abstraction Mapping.ipynb.
An abstraction from a low-level base model $\mathcal{M}$ to a high-level abstracted model $\mathcal{M}'$ is defined by two parts:
A variable-level mapping $a$: a surjective map $a: R \rightarrow \mathcal{X}_{\mathcal{M}'}$, where $R \subseteq \mathcal{X}_\mathcal{M}$, mapping the set $R$ of relevant (endogenous) variables of $\mathcal{M}$ to the (endogenous) variables of $\mathcal{M}'$ surjectively, so that all the (endogenous) variables in $\mathcal{M}'$ have a pre-image in $\mathcal{M}$;
A collection of domain-level mappings $\alpha_{X'}$: $\forall X \in \mathcal{X}_{\mathcal{M}'}$, a surjective function $\alpha_{X'}: \mathcal{M}[a^{-1}(X')] \rightarrow \mathcal{M}'[X']$; that is, for every variable $X'$ in $\mathcal{M}'$ with associated set $\mathcal{M}'[X']$, there is a surjective mapping from the set $\mathcal{M}[a^{-1}(X')]$ associated with the pre-image of $X'$ in $\mathcal{M}$ along $a$. In other words, any outcome of the variable $X'$ in $\mathcal{M}'$ is reachable from the pre-image of variables picked in $\mathcal{M}$ by $a$.
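On finite models the two requirements can be checked mechanically. A minimal sketch (the three-variable low-level model, with $T$ as an irrelevant dropped variable, and all names are ours, chosen for illustration):

```python
def is_surjective(mapping, codomain):
    """A finite map (dict) is surjective iff it hits every codomain element."""
    return set(mapping.values()) == set(codomain)

# Low-level model variables and their domains (hypothetical example).
M_vars = {'S': {0, 1}, 'T': {0, 1}, 'C': {0, 1}}
# High-level model variables and their domains.
Mp_vars = {"S'": {0, 1}, "C'": {0, 1}}

R = {'S', 'C'}                  # relevant low-level variables (T is dropped)
a = {'S': "S'", 'C': "C'"}      # variable-level mapping a: R -> X_{M'}
alpha = {                       # one domain-level map per variable X' of M'
    "S'": {0: 0, 1: 1},         # alpha_{S'}: M[S] -> M'[S']
    "C'": {0: 0, 1: 1},         # alpha_{C'}: M[C] -> M'[C']
}

# Requirement (1): a is defined on R and surjective onto X_{M'}.
print(set(a) == R and is_surjective(a, Mp_vars))
# Requirement (2): every alpha_{X'} is surjective onto M'[X'].
print(all(is_surjective(alpha[xp], Mp_vars[xp]) for xp in Mp_vars))
```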
Let us present a first informal comparison between transformations and abstractions. How do these two ideas relate? We will highlight similarities and differences, and then we will present some simple examples that illustrate how transformations and interventions may relate.
Let us analyze how transformations and abstractions compare from different points of view.
The first important difference is the object to which transformations and abstractions are applied.
Abstractions are concerned with SCMs that are acyclic and finite. Transformations deal with SEMs that drop these requirements.
Transformations consider only a limited set of interventions of interest. Abstractions do not have an explicit set of interventions, but are implicitly defined only in relation to the finite set of interventions allowed by the finite sets over which the variables are defined.
A second important difference revolves around the level of detail of the definition.
A transformation $\tau$ is defined from the joint domain of the outcomes of all the variables of the base model to the codomain of the joint outcomes of all the variables in the transformed model:
$$\tau: \prod_i\mathcal{M}[X_i] \rightarrow \prod_i\mathcal{M'}[X'_i]$$This mapping is monolithic and robust (it considers all the possible outcomes of the joint base distribution), but also quite coarse (it does not explicitly specify and differentiate how variables in the base model relate to variables in the transformed model).
The domain-level mapping $\alpha_{X'}$, instead, is defined from the domain of outcomes of a specific subset of variables of the base model to the codomain of outcomes of a single variable in the transformed model:
$$\alpha_{X'}: \mathcal{M}[a^{-1}(X')] \rightarrow \mathcal{M}'[X']$$This mapping is more complex and involved (it is indeed a collection of mappings), but also more detailed (it explicitly specifies which variables are relevant and how they and their domains are mapped).
In a very loose way the difference between working with the transformation $\tau$ and the domain-level mapping $\alpha_{X'}$ is like working with a complete joint distribution or its factorization: the joint distribution carries all possible information (including many possible factorizations), while a factorization is more specific and lightweight (addressing one specific instance).
An abstraction clearly defines relevant variables in its set $R$. Irrelevant variables are dropped and will not have a role in further specifying the abstraction.
A transformation does not explicitly allow for the selection of relevant variables. However, the mapping $\tau$ may practically make a variable irrelevant. Let the transformation be $\tau: \mathcal{M}[X_1] \times ... \times \mathcal{M}[X_i] \times ... \times \mathcal{M}[X_n] \rightarrow \prod_i\mathcal{M'}[X'_i]$; if $\forall x_j,x_k \in \mathcal{M}[X_i]$ we have $\tau(X_1,...,X_i=x_j,...,X_n) = \tau(X_1,...,X_i=x_k,...,X_n)$, then the variable $X_i$ is virtually irrelevant for the transformation. Notice, however, that this is a notion of relevance tied to the transformation: a virtually irrelevant variable is not actually dropped, and virtual irrelevance does not mean that a variable with the same name may not appear in the transformed model.
An abstraction clearly defines a variable-level mapping as a well-behaved surjective function between relevant variables and variables in the abstracted model.
A transformation does not provide an explicit mapping between variables; instead it maps the joint domain of the outcomes of the variables in the base model to the codomain of the outcomes of the variables in the transformed model. In a loose sense, with respect to the variables, the mapping is trivially exhaustive: it relates the base joint as a whole to the transformed joint as a whole. Practically, though, there are no constraints on how many variables may figure in the joint of the transformed model; the transformed joint may even be defined over more variables than the base model $\mathcal{M}$. In general, the relationship between variables in the two models is less regulated.
An abstraction clearly defines domain-level mappings as well-behaved surjective functions between outcomes of base variable(s) and the outcomes of abstracted variables.
A transformation, once again, does not work with specific (sets of) variables; it just provides a single mapping between the joint domain of the outcomes of the base variables and the joint codomain of the outcomes of the transformed variables. Moreover, this mapping is not required to be surjective: the codomain of transformed outcomes may contain more elements than the domain, either because the transformed model is defined over more variables or because the sets of outcomes of transformed variables are larger than those of base variables. Again, the relationship of transformation seems less regulated, sometimes defying the intuition of abstraction.
An ensuing difference concerns the number of possible mappings.
A transformation $\tau$ has no implicit constraint, and it theoretically allows for $|C|^{|D|}$ possible functions, where $|D|$ denotes the cardinality of the domain given by the Cartesian product of the domain of each variable in the base model, and $|C|$ denotes the cardinality of the codomain given by the Cartesian product of the domain of each variable in the transformed model.
An abstraction is more constrained. Although there is a wide degree of freedom in selecting the set of relevant variables ($R$), how base variables are mapped to abstracted variables ($a$), and how the domains of base variables are mapped to the domains of abstracted variables ($\alpha_{X'}$), this detailed construction, together with explicit constraints such as surjectivity, limits the number of feasible options.
This, in a very loose way again, follows intuitively from the fact that a transformation can freely work with the joint distributions, while an abstraction works more explicitly with the factorization of the joint.
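As a quick sanity check of the count $|C|^{|D|}$, consider two binary variables on each side:

```python
# Two binary variables on each side: |D| = |C| = 2 * 2 = 4 joint outcomes,
# so there are |C| ** |D| = 4 ** 4 possible transformations tau.
D = 2 * 2   # cardinality of the base joint domain
C = 2 * 2   # cardinality of the transformed joint codomain
print(C ** D)   # 256
```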
We can also re-express the differences discussed above in terms of types of constraints and their locus.
The definition of a transformation $\tau$ is very loose, just setting up a mapping between random variables with few explicit constraints. Transformations may thus include mappings that are not well-behaved and that we would hardly conceptualize as abstractions. To regularize and exclude non-desirable transformations, the idea of exact transformation is introduced. By considering the constraints introduced by the structure of the set of interventions of interest, the subset of exact transformations is significantly smaller than the set of all possible transformations.
On the other hand, the definition of an abstraction introduces from the beginning several constraints on $(R,a,\alpha_X)$. The very definition rules out cases that do not fit the requirements of $(R,a,\alpha_X)$ from the start, allowing us to focus on relevant forms of abstraction.
In the perspective described above, we could argue that interventions play different roles.
Assuming we deal with SCMs with a finite number of variables each with a finite domain, it is well known that a SCM is a shorthand (a presentation?) for a finite set of models that can be generated from the base model via intervention. This set, as discussed above, can be given a poset structure.
In the case of transformations, the set of interventions of interest is used to regularize the definition of transformations by selecting a subset of intervened models (or joint probability distributions associated with the intervened models) whose ordering in the poset must be preserved under transformation. Thus interventions are instrumental in defining exact transformations. The set of interventions is the tool used to assess qualitatively the consistency of a transformation.
In the case of abstractions, no set of interventions of interest is explicitly defined, and all possible interventions are taken under consideration. Instead of restricting the definition via interventions, interventions are used to evaluate the degree of approximation of an abstraction. Interventions are the tool used to assess quantitatively the consistency of an abstraction.
Both transformations and abstractions confer great relevance to interventions, and require the existence of interventions both at high-level and low-level.
For exact transformations, the surjective map $\omega$ guarantees that all low-level interventions in $\mathcal{I}$ are mapped (by functionality) and all high-level interventions in $\mathcal{I'}$ are covered (by surjectivity). Yet the mapping is not necessarily injective, so multiple low-level interventions can be mapped to the same high-level intervention.
For abstractions, the mapping $\alpha_{X'}$ between the low-level and the high-level domains automatically guarantees the existence of a high-level intervention as the composition of a low-level intervention and the abstraction. Thus, all low-level interventions have a corresponding high-level intervention, and all high-level interventions may be traced back to at least one low-level intervention by the surjectivity of $\alpha_{X'}$. Again, multiple low-level interventions can be mapped to the same high-level intervention.
Both transformations and abstractions come down to working with (interventional) distributions. However, their focus is slightly different.
Transformations consider joint interventional distributions: they answer the question whether, given a base model, commuting intervention and transformation produces the same distribution. Abstractions focus on sets of conditional distributions (conceptually, the ones defining the factorization of the model): they answer the question whether, given an intervention, commuting mechanism and abstraction produces the same distribution.
The comparison between the commuting diagram of a transformation and that of an abstraction may be useful, but it is inevitably partial and hand-wavy. This is due to the fact that the two diagrams represent different objects and morphisms.
The transformation diagram has distributions as objects and interventions/transformations as morphisms. However, even if we talk in category-theoretic terms, no category has been specified in which these diagrams live. On the other hand, abstraction diagrams rigorously live in $\mathtt{FinStoch}$; their objects are sets and their morphisms are stochastic matrices.
To set up a proper comparison, we could explore how to represent both abstractions and transformations in $\mathtt{FinStoch}$.
Given the caveat above, let us take a closer look at what the exact transformations and zero-error abstractions guarantee.
An exact transformation guarantees that the transformed intervened distribution $\tau(P_{\mathcal{M}_{\iota}})$ is identical to the intervened transformed distribution $P_{\mathcal{M'}_{\iota'}}$. In diagram notation:
$$ \begin{array}{ccc} P_{\mathcal{M}} & \overset{\iota}{\rightarrow} & P_{\mathcal{M}_{\iota}}\\ \sideset{\tau}{}\downarrow & & \sideset{}{\tau}\downarrow\\ P_{\mathcal{M}'} & \overset{\iota'}{\rightarrow} & \tau\left(P_{\mathcal{M}_{\iota}}\right)=P_{\mathcal{M}'_{\iota'}} \end{array} $$Thus, for all $\iota \in \mathcal{I}$ and $\iota' = \omega(\iota)$:
$$ \tau(P_{\mathcal{M}} (X_1, X_2, ... | do(X_a = x_a, X_b = x_b ...)) ) = P_{\mathcal{M'}} ( \tau (X_1, X_2, ...) | do(X'_a = x'_a, X'_b = x'_b ...)), $$where:
The result means that, whether we first intervene and then transform, or first transform and then intervene, we obtain the same joint distribution over all the variables of the transformed model.
A zero-error abstraction guarantees that for every $X'_1,X'_2 \in \mathcal{X}_\mathcal{M'}$ the following diagram commutes:
$$ \begin{array}{ccc} \mathcal{M}_{do}\left[\alpha_{X'_{1}}^{-1}(X'_{1})\right] & \overset{\mathcal{M}_{do}\left[\tilde{\phi}_{\alpha_{X'_{2}}^{-1}(X'_{2})}\right]}{\rightarrow} & \mathcal{M}_{do}\left[\alpha_{X'_{2}}^{-1}(X'_{2})\right]\\ \sideset{\alpha_{X'_{1}}}{}\downarrow & & \sideset{}{\alpha_{X'_{2}}}\downarrow\\ \mathcal{M'}_{do}[X'_{1}] & \overset{\mathcal{M'}_{do}\left[\phi_{X'_{2}}\right]}{\rightarrow} & \mathcal{M'}_{do}[X'_{2}], \end{array} $$which means:
$$ \alpha_{X'_{2}}( P_\mathcal{M}(X_1, X_2, ... \vert do(X_a = x_a, X_b = x_b ...)) ) = P_{\mathcal{M'}} ( X'_2 \vert \alpha_{X'_{1}}(do(X_a = x_a, X_b = x_b ...))), $$where:
The result means that, whether we first abstract and then apply the mechanism, or first apply the mechanism and then abstract, we obtain the same distribution between the considered variables.
Notice the partial parallelism in the formulas, especially in the application of $\tau$ and $\alpha_{X'_{1}},\alpha_{X'_{2}}$. This is due to a misalignment between the commuting diagrams we are trying to compare.
Notice also, once again, how transformations are concerned with joint distributions, while abstractions deal with their factorization into conditionals.
In sum, exact transformations and zero-error abstractions play similar roles, but:
It may be interesting to consider what would be the trivial case of the above commuting diagrams, representing a sort of base case with null-interventions.
For exact transformations, if we have $\emptyset \in \mathcal{I}$ and $\emptyset = \omega(\emptyset)$, we get the following diagram:
$$ \begin{array}{ccc} P_{\mathcal{M}} & \overset{\emptyset}{\rightarrow} & P_{\mathcal{M}}\\ \sideset{\tau}{}\downarrow & & \sideset{}{\tau}\downarrow\\ P_{\mathcal{M}'} & \overset{\emptyset}{\rightarrow} & \tau\left(P_{\mathcal{M}}\right)=P_{\mathcal{M}'} \end{array} $$implying
$$ \tau(P_{\mathcal{M}} (X_1, X_2, ...) ) = P_{\mathcal{M'}} ( \tau (X_1, X_2, ...) ) $$that is, the transformation $\tau$ maps the joint of the base model to the joint of the transformed model.
For zero-error abstractions, we consider for the set $X'_1$ the singleton set; then we get the following diagram:
$$ \begin{array}{ccc} \{*\} & \overset{\mathcal{M}'\left[\tilde{\phi}_{\alpha_{X'_{2}}^{-1}(X'_{2})}\right]}{\rightarrow} & \mathcal{M}\left[\alpha_{X'_{2}}^{-1}(X'_{2})\right]\\ \sideset{_{id}}{}\downarrow & & \sideset{}{_{\alpha_{X'_{2}}}}\downarrow\\ \{*\} & \overset{\mathcal{M}'\left[\phi_{X'_{2}}\right]}{\rightarrow} & \mathcal{M}'[X'_{2}] \end{array} $$where $\mathcal{M}'\left[\tilde{\phi}_{\alpha_{X'_{2}}^{-1}(X'_{2})}\right]$ and $\mathcal{M}'\left[\phi_{X'_{2}}\right]$ are the marginals for $\alpha_{X'_{2}}^{-1}(X'_{2})$ and $X'_{2}$. This implies:
$$ \alpha_{X'_{2}}( P_\mathcal{M}(X_1, X_2, ...) ) = P_{\mathcal{M'}} ( X'_2 ), $$that is, under a null transformation (the only available with a singleton set), the marginal of the abstracted variable $X'_2$ is equal to the abstraction of the marginal of the variables under $\alpha^{-1}_{X'_2}(X'_2)$.
We now take a look at some illustrative cases where transformations and abstractions may or may not coincide.
Let us start with our usual case, in which it is simple to show that transformation and abstraction are identical. This is the case of Example (I) and Example (III).
Let us recall the setup:
We have before analyzed:
It is immediate to notice that $\tau$:
and the set of $\alpha_{X'}$:
express the same transformation/abstraction. Indeed in Example (I) and Example (III) we have seen how $\tau$ and $\alpha_{X'}$ define an identical transformed/intervened model $\mathcal{M'}$ (same variables, same domains for the variables, same stochastic matrices).
Let us now consider a simple case in which we can not find an equivalent abstraction for a given transformation.
We will always start from our standard base model $\mathcal{M}$ and transformed/abstracted model $\mathcal{M'}$:
Let us now suppose we want to consider the following transformation $\tau$:
This is a perfectly legitimate transformation, as it matches the (constraint-light) definition of transformation. We could give it the following interpretation: the transformed model will have the same value of C (cancer) as the base model; moreover, since almost everyone smokes, we will always set S (smoke) to 1. This interpretation/transformation is of course very debatable (starting from the fact that we simplify our abstract model on the basis of an a posteriori data observation), but it is not mathematically wrong.
Notice, though, that $\tau$ is not surjective on the outcomes. Outcomes $(0,*)$ in $\mathcal{M'}$ do not have a pre-image along $\tau$. If we try to find a corresponding abstraction for this transformation we are bound to fail. With the given model $\mathcal{M'}$, the surjective variable-level mapping $a$ must define a one-to-one mapping between relevant low-level variables ($S,C$) and high-level variables ($S,C$). We would then be unable to find a surjective map $\alpha_S: \mathcal{M}[S] \rightarrow \mathcal{M'}[S]$ (or $\alpha_S: \mathcal{M}[C] \rightarrow \mathcal{M'}[S]$) from the domain of $S$ or $C$ in the base model to the domain of $S$ in the abstracted model.
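The impossibility can be checked by brute force. The sketch below (with the Boolean outcomes encoded as bits, an assumption of this illustration) enumerates both bijective variable-level maps $a$ and all candidate surjective $\alpha$'s, and finds none that reproduces $\tau(s,c)=(1,c)$:

```python
from itertools import permutations, product

# tau from the example: keep C (cancer), force S (smoke) to 1
tau = lambda s, c: (1, c)

bits = (0, 1)
found = False
for perm in permutations((0, 1)):            # the two bijective variable-level maps a
    for a_S in product(bits, repeat=2):      # candidate alpha for high-level S (a_S[x] = image of x)
        for a_C in product(bits, repeat=2):  # candidate alpha for high-level C
            if set(a_S) != {0, 1} or set(a_C) != {0, 1}:
                continue                     # alphas must be surjective
            # the abstraction acts component-wise on the (possibly permuted) outcome
            if all((a_S[(s, c)[perm[0]]], a_C[(s, c)[perm[1]]]) == tau(s, c)
                   for s, c in product(bits, repeat=2)):
                found = True
print(found)  # False: no surjective alphas reproduce this tau
```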
The non-existence of such an abstraction may be remedied by changing the very domain $\mathcal{M'}[S]$ from $\{0,1\}$ to $\{1\}$. However, this would require directly changing the abstracted SCM $\mathcal{M'}$, which may be given and which we may not be authorized to change.
Let us now see an alternative case in which, for a different reason, we can not find an equivalent abstraction for a given transformation.
We will always start from our standard base model $\mathcal{M}$ and transformed/abstracted model $\mathcal{M'}$:
But now we will consider the following transformation $\tau$:
This is again a perfectly legitimate transformation, as it matches the definition of transformation. We interpret this transformation as follows: the transformed model will have the same value of C (cancer) as the base model; moreover, since C (cancer) and S (smoke) are tightly coupled, we will set them to the same value. Of course, this interpretation/transformation is again very debatable, but it does not compromise its mathematical legitimacy.
As before, $\tau$ is not surjective with respect to the joint codomain, but this is not our main concern now. With the given model $\mathcal{M'}$, the surjective variable-level mapping $a$ must define a one-to-one mapping between relevant low-level variables ($S,C$) and high-level variables ($S,C$). Although $\tau$ itself is not surjective, the domain-level functions $\alpha_S$ and $\alpha_C$ may in principle be surjective, as both outcomes $0$ and $1$ for $S$ and $C$ are in the codomain of $\tau$.
We encounter here another problem, though. Only the low-level variable $C$ exercises an influence on the high-level variables $S$ and $C$. In other words, the outcomes of the high-level model in terms of $S$ and $C$ are completely determined by the low-level variable $C$. For all practical purposes, the low-level variable $S$ is irrelevant, although not explicitly declared as such by the set $R$. We are then unable to find two surjective functions $\alpha_S$ and $\alpha_C$ with domains $\mathcal{M}[S]$ and $\mathcal{M}[C]$ that have the same behaviour as the transformation $\tau$.
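The virtual irrelevance of $S$ can be verified mechanically (a minimal sketch, again encoding the outcomes as bits):

```python
# tau from this example: copy C into both high-level variables
tau = lambda s, c: (c, c)

# S is virtually irrelevant: flipping s never changes the high-level outcome
s_irrelevant = all(tau(0, c) == tau(1, c) for c in (0, 1))
print(s_irrelevant)  # True
```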
The non-existence of an abstraction may be remedied in this case by changing the set of relevant variables $R$. Once again, though, this would affect the definition of the abstracted SCM $\mathcal{M'}$, which would switch from being defined over two endogenous variables $\mathcal{X}_\mathcal{M'}=\{S,C\}$ to a single variable $\mathcal{X}_\mathcal{M'}=\{C\}$. Such a change transforms the problem and may not be allowed.
Let us now consider a final negative example in which we are unable to find an equivalent abstraction for a given transformation.
We always use the standard base model $\mathcal{M}$ and transformed/abstracted model $\mathcal{M'}$:
We now tweak the previous transformation to make sure that the variable $S$ is not virtually irrelevant:
Again, $\tau$ is a function complying with the definition of transformation. We could interpret it as follows: the transformed model will have the same value of C (cancer) as the base model; moreover, since C (cancer) and S (smoke) are tightly coupled, we will set them to the same value, except in one case: smoking patients without cancer are preserved. It is very debatable whether such a transformation should be considered an abstraction: the finicky, case-by-case treatment of very concrete details may raise the question of whether we are performing an abstraction at all. Yet, the transformation has its mathematical legitimacy.
We will soon see that this transformation does not have a corresponding abstraction. As in the previous example, the non-surjectivity of $\tau$ is not our main problem. Nor is one of the two low-level variables ($S,C$) irrelevant, because, at least in one case, knowledge of both is required to define the value of the high-level variables ($S,C$).
The problem here, instead, is that the low-level variable $C$ 'splits' its influence over both the high-level variables $S$ and $C$. The low-level variable $S$ is not irrelevant, as it is necessary to determine the outcome of the high-level variable $S$. But what really matters is that the low-level variable $C$ is necessary to determine the outcome of both high-level variables $S$ and $C$. Because $a$ is a function assigning each low-level variable to a single high-level variable, each $\alpha_{X'}$ may depend only on the variables in $a^{-1}(X')$, and we cannot find an abstraction that behaves like the above transformation $\tau$.
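The obstruction can be made concrete with a small check (a sketch; the lookup table below encodes the transformation just described): the first high-level component depends on both low-level variables, so no $\alpha_S$, whose domain is a single low-level variable, can realize it.

```python
# tau as a lookup table over (s, c): copy C into both high-level variables,
# except that the outcome (S=1, C=0) is preserved as-is
tau = {(0, 0): (0, 0), (0, 1): (1, 1), (1, 0): (1, 0), (1, 1): (1, 1)}

# First high-level component as a function of the low-level outcome
f = {x: y[0] for x, y in tau.items()}

depends_on_s = any(f[(0, c)] != f[(1, c)] for c in (0, 1))
depends_on_c = any(f[(s, 0)] != f[(s, 1)] for s in (0, 1))
print(depends_on_s, depends_on_c)  # True True: f cannot factor through a single variable
```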
Remedying this problem is more complicated, because it clashes with the idea of simplification underlying the definition of abstraction: we can not have a low-level variable that gets abstracted into multiple high-level variables.
Let us perform a quick computation over the number of possibilities available when we consider transformations and abstractions.
Let us suppose we are given the usual models we considered in Example (I), that is:
Let us see how we could define a transformation or an abstraction between them.
Transformation. Let us define a transformation $\tau$ from $\mathcal{M}$ to $\mathcal{M'}$.
The transformation is defined as $\tau: \prod_i\mathcal{M}[X_i] \rightarrow \prod_i\mathcal{M'}[X'_i]$, which in our case corresponds to a mapping $\tau: \{0,1\}^3 \rightarrow \{0,1\}^2$. Denoting by $|D|$ the cardinality of the domain and by $|C|$ the cardinality of the codomain, we have $|C|^{|D|}$ possible functions $\tau$; in our case, this amounts to $4^8 = 65536$.
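This count is easy to confirm programmatically (a minimal sketch):

```python
from itertools import product

# All low-level outcomes {0,1}^3 and high-level outcomes {0,1}^2
domain = list(product((0, 1), repeat=3))
codomain = list(product((0, 1), repeat=2))

# Any assignment of a codomain element to each domain element is a valid tau
n_taus = len(codomain) ** len(domain)
print(n_taus)  # 65536 == 4 ** 8
```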
Abstraction. Let us do the same analysis for an abstraction from $\mathcal{M}$ to $\mathcal{M'}$.
First of all, we need to define our set of relevant variables $R$. The cardinality $|R|$ is constrained to be less than or equal to $|\mathcal{X}_\mathcal{M}|$ (we cannot select more variables than the ones available in the base model) and greater than or equal to $|\mathcal{X}_\mathcal{M'}|$ (we need a surjective mapping between variables). The number of possible choices is given by all combinations: $\sum_{i=|\mathcal{X}_\mathcal{M'}|}^{|\mathcal{X}_\mathcal{M}|} \binom{|\mathcal{X}_\mathcal{M}|}{i}$. In our case $\binom{3}{2} + \binom{3}{3} = 4$: $\{ \{S,T\},\{S,C\},\{T,C\},\{S,T,C\} \}$.
Say that we pick the usual case where $R=\{S,C\}$.
Next, we need to come up with the surjective mapping $a: R \rightarrow \mathcal{X}_{\mathcal{M}'}$. Surjectivity adds a constraint to our choice: all the elements of the codomain must be images of the mapping. The number of choices depends on the cardinality $|R|$; if $|R|=|\mathcal{X}_\mathcal{M'}|$ then we have $|R|!$ choices. In our case that is $2! = 2$: $\{ \{S \mapsto S$, $C \mapsto C\},\{S \mapsto C$, $C \mapsto S\} \}$.
We will pick the trivial function $a$ mapping each variable to the variable with the same name: $S \mapsto S$, $C \mapsto C$.
Finally, we need to provide a list of surjective mappings $\alpha_{X'}: \mathcal{M}[a^{-1}(X')] \rightarrow \mathcal{M}'[X']$. Specifically we have to provide a number of mappings equal to $|\mathcal{X}_\mathcal{M'}|$; for each one of these mappings we have a number of choices dependent on the cardinality of domain $\mathcal{M}[a^{-1}(X')]$ and codomain $\mathcal{M'}[X']$ under the constraint of surjectivity. In our case we have $2$ functions to specify: $\{ \alpha_{S}, \alpha_{C} \}$ , each with $2! = 2$ possibilities: $\{ \{0 \mapsto 0$, $1 \mapsto 1\},\{0 \mapsto 1$, $1 \mapsto 0\} \}$.
Again, we will pick the standard identity mappings for both $\alpha_S$ and $\alpha_C$.
Comparison. In sum, in our example, we could choose between $4^8 = 65536$ legitimate transformations, while our choice for abstraction was more limited:
$$ \underbrace{1}_{\begin{array}{c} \textrm{ways of}\\ \textrm{choosing}\\ \textrm{3 relevant}\\ \textrm{variables} \end{array}}\cdot\underbrace{2!\binom{3}{2}}_{\begin{array}{c} \textrm{surjective}\\ \textrm{maps }a\\ \textrm{with }|dom|=3\\ \textrm{and }|cod|=2 \end{array}}\cdot\underbrace{\left(2^{4}-2\right)}_{\begin{array}{c} \textrm{surjective}\\ \textrm{maps }\alpha\\ \textrm{with }|dom|=2^{2}\\ \textrm{and }|cod|=2 \end{array}}\cdot\underbrace{2!}_{\begin{array}{c} \textrm{surjective}\\ \textrm{maps }\alpha\\ \textrm{with }|dom|=2\\ \textrm{and }|cod|=2 \end{array}}+\underbrace{3}_{\begin{array}{c} \textrm{ways of}\\ \textrm{choosing}\\ \textrm{2 relevant}\\ \textrm{variables} \end{array}}\cdot\underbrace{2!}_{\begin{array}{c} \textrm{surjective}\\ \textrm{maps }a\\ \textrm{with }|dom|=2\\ \textrm{and }|cod|=2 \end{array}}\cdot\underbrace{\left(2!\right)^{2}}_{\begin{array}{c} \textrm{pairs of}\\ \textrm{surjective}\\ \textrm{maps }\alpha\\ \textrm{with }|dom|=2\\ \textrm{and }|cod|=2 \end{array}}=1\cdot6\cdot14\cdot2+3\cdot2\cdot4=192 $$The much smaller number of abstractions ($192$ versus $65536$ possible transformations) seems consistent with the examples we saw before, in which specific transformations could not be expressed as abstractions.
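The abstraction count can be double-checked by brute-force enumeration. The sketch below assumes, as in Example (I), that all variables are binary; the variable names are placeholders.

```python
from itertools import combinations, product

def surjections(dom, cod):
    """Enumerate all surjective maps dom -> cod as dicts."""
    return [dict(zip(dom, images))
            for images in product(cod, repeat=len(dom))
            if set(images) == set(cod)]

low_vars = ('S', 'T', 'C')   # base model variables, each with domain {0,1}
high_vars = ('S_', 'C_')     # abstracted model variables, each with domain {0,1}

total = 0
for r in range(len(high_vars), len(low_vars) + 1):
    for R in combinations(low_vars, r):          # choice of relevant variables
        for a in surjections(R, high_vars):      # variable-level mapping a
            n_alphas = 1
            for X in high_vars:                  # one alpha per high-level variable
                preimage = [v for v in R if a[v] == X]
                dom = list(product((0, 1), repeat=len(preimage)))
                n_alphas *= len(surjections(dom, (0, 1)))
            total += n_alphas
print(total)  # 192
```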
We have compared transformations and abstractions from several points of view:
Property | Transformation | Abstraction |
---|---|---|
Domain (model) | Semi-Markovian SCM | Finite, acyclic semi-Markovian SCM |
Domain (interventions) | Set of interventions | All possible interventions (finite) |
Level of detail | Joint distribution (model) | Factorized marginals (mechanisms) |
Relevant variables | Virtually irrelevant variables | Explicitly irrelevant variables |
Variable-level mapping | Unrestricted | Constrained: surjective |
Domain-level mapping | Unrestricted | Constrained: surjective |
Cardinality of possibilities | Large (joint) | Small (factorizations) |
Type/locus of constraints | Order-preserving map of interventions (not in definition of transformation) | Restrictions and surjectivity (in definition of abstraction) |
Role of interventions | Regularize and restrict the analysis of exactness | Evaluate abstraction error |
Existence of interventions at low- and high-level | Guaranteed by surjective $\omega$ | Guaranteed by surjective $\alpha_{X'}$ |
Focus on distribution | Joint interventional distribution | Mechanism distribution under intervention |
Commuting diagrams | Distributions and measurable functions | Sets and stochastic matrices |
Commutativity | Intervention and abstraction (sample) | (intervene) Abstract and sample |
Trivial commutativity | Identity of joints under transformation | Identity of marginals under abstraction |
Transformations allow for looser mappings between models, some of which may escape our intuition of abstraction or not behave well. An abstraction, instead, has a more detailed definition that constrains our choices but also allows for mappings that better express our understanding of abstraction. Sometimes, transformation and abstraction may coincide (see Example (V)). More frequently we may find no correspondence, such as when we have:
In general, there are more allowed transformations than allowed abstractions (see Example (IX)).
We have concluded that exact transformations and zero-error abstractions provide different guarantees:
Zero-error abstractions guarantee the identity of:
Exact transformations guarantee the identity of:
(Notice that when we say intervention on all the variables we do not mean that the intervention acts on all the variables, but that the mapping $\iota$ has as domain the set of all the variables.)
This analysis has provided a first thorough, although informal, comparison between abstractions and transformations. A more rigorous alignment of the two concepts, and an evaluation of when exact transformations and zero-error abstractions are identical, is left for another notebook.
In this notebook we have presented a different approach to relating models at different levels of granularity. After analyzing the idea of abstraction (from [Rischel2020]) in the previous two notebooks, here we have focused on the alternative notion of transformation (from [Rubenstein2017]). We have also provided an initial informal comparison between the two definitions, trying to highlight similarities and differences in the two approaches. A number of examples were used to illustrate the new concept of transformation and how it relates to abstraction.
[Rischel2020] Rischel, Eigil Fjeldgren. "The Category Theory of Causal Models." (2020).
[Rubenstein2017] Rubenstein, Paul K., et al. "Causal consistency of structural equation models." arXiv preprint arXiv:1707.00819 (2017).
[Pearl2009] Pearl, Judea. Causality. Cambridge University Press, 2009.
[Peters2017] Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference: foundations and learning algorithms. The MIT Press, 2017.
[Spivak2014] Spivak, David I. Category theory for the sciences. MIT Press, 2014.
[Fong2018] Fong, Brendan, and David I. Spivak. "Seven sketches in compositionality: An invitation to applied category theory." arXiv preprint arXiv:1803.05316 (2018).